GPU Parallelization for Unstructured Sparse Matrix Problems with OpenMP 4.5 and OpenACC
نویسندگان
چکیده
The effective use of parallelized hardware is an important goal of today’s computer developments. Nvidia GPUs are an important footing in this context. While CUDA implemented algorithms focus on detailed optimized usage of GPU elements the pragma directive parallelization targets GPU computation for a broader community. In this paper we focus on the implementation of OpenACC and OpenMP 4.5 parallelization for Nvidia GPUs for a sparse matrix solver on unstructured discretizations. We show similarities between these methods and current performance differences. We focus also on the possibilities to force pragma directive parallelized GPU code to a specific vectorization. Finally we demonstrate the performance of these methods in a complex structured C++ implementation of the CG and the GMRES method with an algebraic multigrid as preconditioner.
منابع مشابه
A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems
In this paper, we study various parallelization schemes for the Variable Neighborhood Search (VNS) metaheuristic on a CPU-GPU system via OpenMP and OpenACC. A hybrid parallel VNS method is applied to recent benchmark problem instances for the multi-product dynamic lot sizing problem with product returns and recovery, which appears in reverse logistics and is known to be NP-hard. We report our f...
متن کاملTrellis: Portability across architectures with a high-level framework
The increasing computational needs of parallel applications inevitably require portability across parallel architectures, which now include heterogeneous processing resources, such as CPUs and GPUs, and multiple SIMD/SIMT widths. However, the lack of a common parallel programming paradigm that provides predictable, near-optimal performance on each resource leads to the use of low-level framewor...
متن کاملA Feasibility Study on Porting the Community Land Model onto Accelerators Using Openacc
As environmental models (such as Accelerated Climate Model for Energy (ACME), Parallel Reactive Flow and Transport Model (PFLOTRAN), Arctic Terrestrial Simulator (ATS), etc.) became more and more complicated, we are facing enormous challenges regarding to porting those applications onto hybrid computing architecture. OpenACC emerges as a very promising technology, therefore, we have conducted a...
متن کاملFast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that...
متن کاملExploring Programming Multi-GPUs using OpenMP & OpenACC-based Hybrid Model
Heterogeneous computing come with tremendous potential and is a leading candidate for scientific applications that are becoming more and more complex. Accelerators such as GPUs whose computing momentum is growing faster than ever offer application performance when compute intensive portions of an application are offloaded to them. It is quite evident that future computing architectures are movi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017